Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ST_ClusterIntersectingWin and ST_ClusterWithinWin #721

Closed
wants to merge 11 commits into from

Conversation

pramsey
Copy link
Member

@pramsey pramsey commented Feb 24, 2023

Access the functionality of ST_ClusterIntersecting and ST_ClusterWithin, but using a window function instead of getting back a big lump of GeometryCollection.

@robe2
Copy link
Member

robe2 commented Feb 25, 2023

Your NEWs note should reference GH721

<funcsynopsis>
<funcprototype>
<funcdef>integer <function>ST_ClusterIntersectingWin</function></funcdef>
<paramdef><type>geometry</type> <parameter>g</parameter></paramdef>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<paramdef><type>geometry</type> <parameter>g</parameter></paramdef>
<paramdef><type>geometry winset </type> <parameter>geom</parameter></paramdef>

I'd change to geom to be consistent with the other functions, unless you plan to at some point in time make this work for geography, might be a good time to experiment with anyelement :)

The winset is fluff to flag it as a window function so the garden test can test it and it also appears in the 15.2. PostGIS Window Functions section.

<funcsynopsis>
<funcprototype>
<funcdef>integer <function>ST_ClusterWithinWin</function></funcdef>
<paramdef><type>geometry </type> <parameter>g</parameter></paramdef>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<paramdef><type>geometry </type> <parameter>g</parameter></paramdef>
<paramdef><type>geometry winset </type> <parameter>geom</parameter></paramdef>

Same as mentioned in other

NEWS Outdated
@@ -15,6 +15,7 @@ xxxx/xx/xx
- #5286, RenameTopoGeometryColumn (Sandro Santilli)
- GH703, Add min/max resampling as options (Christian Schroeder)
- #5336, topogeometry cast to topoelement support (Regina Obe)
- New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey)
- GH721, New window-based ST_ClusterWithinWin and ST_ClusterIntersectingWin (Paul Ramsey)

Copy link
Member

@robe2 robe2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more changes - the availability should be 3.4.0 not 3.4

LANGUAGE 'c' IMMUTABLE STRICT WINDOW PARALLEL SAFE
_COST_HIGH;

-- Availability: 3.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- Availability: 3.4
-- Availability: 3.4.0

<refsection>
<title>Description</title>

<para>ST_ClusterWithinWin is a window function that returns an integer for each input geometry, where the integer the cluster number the geometry is a member of. Clusters represent a set of input geometries separated by no more than the specified distance. (Distances are Cartesian distances in the units of the SRID.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning that this function is equivalent to ST_ClusterDBSCAN(eps, minPoints:=0) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they fully equivalent? In behaviour as well as performance? If so, perhaps only one is needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ST_ClusterWithin just calls DBSCAN.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's equivalent to ST_ClusterDBSCan, then yah we should not bother adding it. One less function and then just add an addendum to the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbaston

Worth mentioning that this function is equivalent to ST_ClusterDBSCAN(eps, minPoints:=0) ?

Is there a reason you didn't set minPoints as default arg with value 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, seems to me that there could be a faster implementation for ClusterWithin (since unlike DBSCAN it is a deterministic function which doesn't have to be interative). So providing an specific API for it will allow a future optimized implementation, and lets users be specific about what they want to compute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBSCAN with minPoints=0 is deterministic. I'm not sure what you would propose changing about it? I have no opinion on adding a new function or not, I just thought it would be good to point out the equivalency in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's already special-cased, if that's what you're getting at)

int union_dbscan(LWGEOM** geoms, uint32_t num_geoms, UNIONFIND* uf, double eps, uint32_t min_points, char** in_a_cluster_ret)
{
if (min_points <= 1)
return union_dbscan_minpoints_1(geoms, num_geoms, uf, eps, in_a_cluster_ret);
else
return union_dbscan_general(geoms, num_geoms, uf, eps, min_points, in_a_cluster_ret);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already special-cased

Well, that's a good optimization. And seems like it might actually be the optimal implementation for Cluster-Within? (A bit hard to tell, since the generality makes the algorithm somewhat obscure).

Anyway, the documentation for ST_ClusterDBSCAN has a complex description which does not make it easy to deduce that it is also useful for a deterministic Cluster-Within functionality. So having a separate function and doc page seems ideal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my suggestion was only to document the equivalence.

</para>

<para>Availability: 3.4.0</para>
</refsection>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
</refsection>
<para>&curve_support;</para>
</refsection>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the same token should ClusterIntersecting have this marker? I'm not sure if it means "native arc support" (like distance and area have) or "stroked arc support" (like all the GEOS backed functions have).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would interpret it to mean native arc support, otherwise all functions would have arc support, no? (But I agree, the wording isn't clear one way or the other.) In this case ClusterWithin would have native arc support but ClusterIntersecting uses GEOS so it would not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DBScan is not currently marked as having arc support. add there too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup

<title>Description</title>

<para>ST_ClusterIntersectingWin is a windowing function that builds inter-connecting clusters of geometries that intersect. It is possible to traverse all geometries in a cluster without leaving the cluster. The return value is the cluster number that the geometry argument participates in, or null for null inputs.</para>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be expected to be equivalent to ST_ClusterDBSCAN(0, 0), if not for cases where ST_Distance(a,b) == 0 but ST_Intersects(a, b) is false.

@robe2
Copy link
Member

robe2 commented Feb 27, 2023

@pramsey I investigated the error cirrus errors. It's happening when Cirrus CI is testing dumprestore.

I run into the same issue if I do:

make check RUNTESTFLAGS="-v --dumprestore"

But works fine

make check RUNTESTFLAGS="--extension -v --dumprestore"

I'm going to flip Cirrus CI to test against extension based install, since that's really what we care about these days anyway.

I'll ticket this as a separate issue in postgis_restore.pl

strk pushed a commit that referenced this pull request Feb 27, 2023
Was doing script installed dumprestore check
which causes a failure in #721
Dumprestore Issue references #5348
@strk
Copy link
Member

strk commented Feb 27, 2023

Could you please rebase against current master branch, resolving conflicts, to see if it fixes the --dumprestore test ?

@postgis postgis deleted a comment from robe2 Feb 27, 2023
@robe2
Copy link
Member

robe2 commented Mar 4, 2023

@pramsey Can we close this out since you've already committed it or is there a reason it is still open?

@pramsey pramsey closed this Mar 4, 2023
@pramsey pramsey deleted the master-window-cluster branch May 25, 2023 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants